In visual effects, match moving is a technique that allows the insertion of 2D elements, other live action elements or CG computer graphics into live-action footage with correct position, scale, orientation, and motion relative to the photographed objects in the shot. It also allows for the removal of live action elements from the live action shot. The term is used loosely to describe several different methods of extracting camera motion information from a motion picture. Also referred to as motion tracking or camera solving, match moving is related to rotoscoping and photogrammetry. Match moving is sometimes confused with motion capture, which records the motion of objects, often human actors, rather than the camera. Typically, motion capture requires special cameras and sensors and a controlled environment (although recent developments such as the Kinect camera and Apple's Face ID have begun to change this). Match moving is also distinct from motion control photography, which uses mechanical hardware to execute multiple identical camera moves. Match moving, by contrast, is typically a software-based technology, Post-production to normal footage recorded in uncontrolled environments with an ordinary camera.
Match moving is primarily used to track the movement of a camera through a shot so that an identical virtual camera move can be reproduced in a 3D animation program. When new animated elements are composited back into the original live-action shot, they will appear in perfectly matched perspective and therefore appear seamless.
As it is mostly software-based, match moving has become increasingly affordable as the cost of computer power has declined; it is now an established visual-effects tool and is even used in live television broadcasts as part of providing effects such as the yellow virtual down-line in American football.
When a point on the surface of a three-dimensional object is photographed, its position in the 2-D frame can be calculated by a 3-D projection function. We can consider a camera to be an abstraction that holds all the parameters necessary to model a camera in a real or virtual world. Therefore, a camera is a vector that includes as its elements the position of the camera, its orientation, focal length, and other possible parameters that define how the camera focuses light onto the film plane. Exactly how this vector is constructed is not important as long as there is a compatible projection function P.
The projection function P takes as its input a camera vector (denoted camera) and another vector the position of a 3-D point in space (denoted xyz) and returns a 2D point that has been projected onto a plane in front of the camera (denoted XY). We can express this:
The projection function transforms the 3-D point and strips away the component of depth. Without knowing the depth of the component an inverse projection function can only return a set of possible 3D points, that form a line emanating from the nodal point of the camera lens and passing through the projected 2-D point. We can express the inverse projection as:
or
Let's say we are in a situation where the features we are tracking are on the surface of a rigid object such as a building. Since we know that the real point xyz will remain in the same place in real space from one frame of the image to the next we can make the point a constant even though we do not know where it is. So:
where the subscripts i and j refer to arbitrary frames in the shot we are analyzing. Since this is always true then we know that:
Because the value of XY i has been determined for all frames that the feature is tracked through by the tracking program, we can solve the reverse projection function between any two frames as long as P'( camera i, XY i) ∩ P'( camera j, XY j) is a small set. Set of possible camera vectors that solve the equation at i and j (denoted C ij).
So there is a set of camera vector pairs C ij for which the intersection of the inverse projections of two points XY i and XY j is a non-empty, hopefully small, set centering on a theoretical stationary point xyz .
In other words, imagine a black point floating in a white void and a camera. For any position in space that we place the camera, there is a set of corresponding parameters (orientation, focal length, etc.) that will photograph that black point exactly the same way. Since C has an infinite number of members, one point is never enough to determine the actual camera position.
As we start adding tracking points, we can narrow the possible camera positions. For example, if we have a set of points { xyz i,0,..., xyz i,n} and { xyz j,0,..., xyz j,n} where i and j still refer to frames and n is an index to one of many tracking points we are following. We can derive a set of camera vector pair sets {C i,j,0,...,C i,j,n}.
In this way multiple tracks allow us to narrow the possible camera parameters. The set of possible camera parameters that fit, F, is the intersection of all sets:
The fewer elements are in this set the closer we can come to extracting the actual parameters of the camera. In reality, errors introduced into the tracking process require a more statistical approach to determining a good camera vector for each frame; optimization algorithms and bundle block adjustment are often utilized. Unfortunately there are so many elements to a camera vector that when every parameter is free we still might not be able to narrow F down to a single possibility no matter how many features we track. The more we can restrict the various parameters, especially focal length, the easier it becomes to pinpoint the solution.
In all, the 3D solving process is the process of narrowing down the possible solutions to the motion of the camera until we reach one that suits the needs of the composite we are trying to create.
A reconstruction program can create three-dimensional objects that mimic the real objects from the photographed scene. Using data from the point cloud and the user's estimation, the program can create a virtual object and then extract a texture from the footage that can be projected onto the virtual object as a surface texture.
Three-dimensional match moving tools make it possible to extrapolate three-dimensional information from two-dimensional photography. These tools allow users to derive camera movement and other relative motion from arbitrary footage. The tracking information can be transferred to computer graphics software and used to animate virtual cameras and simulated objects. Programs capable of 3-D match moving include:
The advantage of automatic tracking is that the computer can create many points faster than a human can. A large number of points can be analyzed with statistics to determine the most reliable data. The disadvantage of automatic tracking is that, depending on the algorithm, the computer can be easily confused as it tracks objects through the scene. Automatic tracking methods are particularly ineffective in shots involving fast camera motion such as that seen with hand-held camera work and in shots with repetitive subject matter like small tiles or any sort of regular pattern where one area is not very distinct. This tracking method also suffers when a shot contains a large amount of motion blur, making the small details it needs harder to distinguish.
The advantage of interactive tracking is that a human user can follow features through an entire scene and will not be confused by features that are not rigid. A human user can also determine where features are in a shot that suffers from motion blur; it is extremely difficult for an automatic tracker to correctly find features with high amounts of motion blur. The disadvantage of interactive tracking is that the user will inevitably introduce small errors as they follow objects through the scene, which can lead to what is called "drift".
Professional-level motion tracking is usually achieved using a combination of interactive and automatic techniques. An artist can remove points that are clearly anomalous and use "tracking mattes" to block confusing information out of the automatic tracking process. Tracking mattes are also employed to cover areas of the shot which contain moving elements such as an actor or a spinning ceiling fan.
Most match moving applications are based on similar algorithms for tracking and calibration. Often, the initial results obtained are similar. However, each program has different refining capabilities.
To achieve this, a number of components from hardware to software need to be combined. Software collects all of the 360 degrees of freedom movement of the camera as well as metadata such as zoom, focus, iris and shutter elements from many different types of hardware devices, ranging from motion capture systems such as active LED marker based system from PhaseSpace, passive systems such as Motion Analysis or Vicon, to rotary encoders fitted to camera cranes and dollies such as Technocranes and Fisher Dollies, or inertia & gyroscopic sensors mounted directly to the camera. There are also laser based tracking systems that can be attached to anything, including Steadicams, to track cameras outside in the rain at distances of up to 30 meters.
Motion control cameras can also be used as a source or destination for 3D camera data. Camera moves can be pre-visualised in advance and then converted into motion control data that drives a camera crane along precisely the same path as the 3-D camera. Encoders on the crane can also be used in real time on-set to reverse this process to generate live 3D cameras. The data can be sent to any number of different 3D applications, allowing 3D artists to modify their CGI elements live on set as well. The main advantage being that set design issues that would be time-consuming and costly issues later down the line can be sorted out during the shooting process, ensuring that the actors "fit" within each environment for each shot whilst they do their performances.
Real time motion capture systems can also be mixed within camera data stream allowing virtual characters to be inserted into live shots on-set. This dramatically improves the interaction between real and non-real MoCap driven characters as both plate and CGI performances can be choreographed together.
|
|